-
Notifications
You must be signed in to change notification settings - Fork 2k
[BUG] Idempotent creates of attached functions fail. #5954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Handle idempotent attached function creation across coordinator and DAO The PR fixes coordinator idempotency when re-attaching functions by ensuring the DAO can retrieve both ready and pending attached functions and by tolerating duplicate insert attempts. A new distributed integration test reproduces the attach-detach-attach sequence to prevent regressions, and Go mocks/interfaces are updated accordingly. Key Changes• Add Affected Areas• This summary was automatically generated by @propel-code-bot |
| def test_count_function_attach_and_detach_attach_attach(basic_http_client: System) -> None: | ||
| """Test creating and removing a function with the record_counter operator""" | ||
| client = ClientCreator.from_system(basic_http_client) | ||
| client.reset() | ||
|
|
||
| # Create a collection | ||
| collection = client.get_or_create_collection( | ||
| name="my_document", | ||
| metadata={"description": "Sample documents for task processing"}, | ||
| ) | ||
|
|
||
| # Create a task that counts records in the collection | ||
| attached_fn = collection.attach_function( | ||
| name="count_my_docs", | ||
| function_id="record_counter", # Built-in operator that counts records | ||
| output_collection="my_documents_counts", | ||
| params=None, | ||
| ) | ||
|
|
||
| # Verify task creation succeeded | ||
| assert attached_fn is not None | ||
| initial_version = get_collection_version(client, collection.name) | ||
|
|
||
| # Add documents | ||
| collection.add( | ||
| ids=["doc_{}".format(i) for i in range(0, 300)], | ||
| documents=["test document"] * 300, | ||
| ) | ||
|
|
||
| # Verify documents were added | ||
| assert collection.count() == 300 | ||
|
|
||
| wait_for_version_increase(client, collection.name, initial_version) | ||
| # Give some time to invalidate the frontend query cache | ||
| sleep(60) | ||
|
|
||
| result = client.get_collection("my_documents_counts").get("function_output") | ||
| assert result["metadatas"] is not None | ||
| assert result["metadatas"][0]["total_count"] == 300 | ||
|
|
||
| # Remove the task | ||
| success = attached_fn.detach( | ||
| delete_output_collection=True, | ||
| ) | ||
|
|
||
| # Verify task removal succeeded | ||
| assert success is True | ||
|
|
||
| # Create a task that counts records in the collection | ||
| attached_fn = collection.attach_function( | ||
| name="count_my_docs", | ||
| function_id="record_counter", # Built-in operator that counts records | ||
| output_collection="my_documents_counts", | ||
| params=None, | ||
| ) | ||
| assert attached_fn is not None | ||
|
|
||
| # Create a task that counts records in the collection | ||
| attached_fn = collection.attach_function( | ||
| name="count_my_docs", | ||
| function_id="record_counter", # Built-in operator that counts records | ||
| output_collection="my_documents_counts", | ||
| params=None, | ||
| ) | ||
| assert attached_fn is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Maintainability] [CodeDuplication] The new test test_count_function_attach_and_detach_attach_attach is very similar to the existing test_count_function_attach_and_detach test in the same file. A significant portion of the code, including setting up the collection, attaching the function, adding data, and verifying the initial run, is duplicated.
To improve maintainability and reduce redundancy, consider refactoring the common setup and execution logic into a helper function that both tests can call.
Context for Agents
[CodeDuplication] The new test `test_count_function_attach_and_detach_attach_attach` is very similar to the existing `test_count_function_attach_and_detach` test in the same file. A significant portion of the code, including setting up the collection, attaching the function, adding data, and verifying the initial run, is duplicated.
To improve maintainability and reduce redundancy, consider refactoring the common setup and execution logic into a helper function that both tests can call.
File: chromadb/test/distributed/test_task_api.py
Line: 296| err = s.catalog.metaDomain.AttachedFunctionDb(txCtx).Insert(attachedFunction) | ||
| if err != nil { | ||
| if err == common.ErrAttachedFunctionAlreadyExists { | ||
| // idempotent fall through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we verify that the nonready function that exists actually matches the currently requested function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a bunch of code above that does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does that only for functions that are ready. You might be able to reuse the above checks by changing GetByCollectionID to a function called GetAnyByCollectionID that returns deleted (consistency with other GetAny methods) and non-ready functions.
| var attachedFunctions []*dbmodel.AttachedFunction | ||
| err := s.db. | ||
| Where("input_collection_id = ?", inputCollectionID). | ||
| Where("is_deleted = ?", false). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically all the other GetAny methods return deleted functions too so this is an inconsistency. I can add this to my to-clean-up list for cleanup day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning "Deleted" breaks this case. What would you call between GetBy and GetAnyBy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed it ReadyOrNotReady and documented.
| existingAttachedFunctions, err := s.catalog.metaDomain.AttachedFunctionDb(txCtx).GetAnyByCollectionID(req.InputCollectionId) | ||
| if err != nil { | ||
| log.Error("AttachFunction: failed to check for existing attached function", zap.Error(err)) | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Re: line +102]
Ok now this error is wrong. If you have more than one ready attached function that's a problem. But it could be possible to have a bunch of partially attached nonready function.
See this comment inline on Graphite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we allow that? That seems like a buggy state to allow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if they never become ready and are the cumulative result of multiple failed backfill requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like we're letting our invariants go loose. If there can only be one, better not make them fight for that distinction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invariant is that there can only be one ready function, that reminds me that I need to add validation for that before making a function ready in the commit phase of AttachFunction 2PC.
This comment has been minimized.
This comment has been minimized.
81e1ce2 to
6f1db5f
Compare
Idempotent creates of attached functions fail because we check the error and then throw an error anyway. This PR adds a carve out for that error condition and adds a test of the case that got us into the state where this was noticed. retrieve any (ready/not-ready) task Rename ReadyOrNotReady
6f1db5f to
6882911
Compare
This comment has been minimized.
This comment has been minimized.
| existingAttachedFunctions, err := s.catalog.metaDomain.AttachedFunctionDb(txCtx).GetReadyOrNotReadyByCollectionID(req.InputCollectionId) | ||
| if err != nil { | ||
| log.Error("AttachFunction: failed to check for existing attached function", zap.Error(err)) | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Re: line +125]
This line of code should be doing what your change down below intends on doing. I wonder with your addition of GetReadyOrNotReadyByCollectionID whether the right thing to do now is to get rid of the fall-through on line 198.
We also now need to make sure line 121 doesn't error out too early if it happens to read a non-matching unready function first.
See this comment inline on Graphite.
Description of changes
Idempotent creates of attached functions fail because we check the error
and then throw an error anyway. This PR adds a carve out for that error
condition and adds a test of the case that got us into the state where
this was noticed.
Test plan
CI
Migration plan
N/A
Observability plan
N/A
Documentation Changes
N/A